
Problem Note 62873: The "Missing Values and Imputed Values" example in the HPFOREST procedure documentation gives incorrect information


In the SAS® Enterprise Miner™ High-Performance Procedures documentation, the chapter "The HPFOREST Procedure" contains an example called "Missing Values and Imputed Values". The stated purpose of the example is as follows:

This example uses the Home Equity data from the SAS sample library to illustrate the difference between using missing values and using imputed values.

 

However, the conclusion that a difference exists is based on a false premise.  You can ignore the example.

 

 

DETAILS

 

proc hpimpute data=sampsio.hmeq out=imout;
   input mortdue value yoj clage ninq clno debtinc derog delinq;
   impute mortdue value yoj clage ninq clno debtinc derog delinq/method=mean;
run;
data job_reason;
   set sampsio.hmeq;
   if job='' then job="Other";
   if reason='' then reason="DebtCon";
run;
data imout;
   merge imout job_reason;
run;
 

The DATA step code that creates the imputed table imout does not output the correct imputed values for the variables that PROC HPIMPUTE imputes. The incorrect values occur because the OUT= data set that is created by PROC HPIMPUTE does not preserve the observation order of the DATA= data set when the procedure runs in multithreaded mode (the default). Because the MERGE statement in the DATA step has no BY statement, observations are combined by position, so the merged observations are not correctly matched.
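One way to make the merge independent of observation order is to match on an explicit key. The following is a minimal sketch, not the documented fix. It assumes a hypothetical key variable rowid and a hypothetical intermediate table hmeq_keyed, neither of which appears in the original example, and it assumes that the ID statement of PROC HPIMPUTE copies rowid to the OUT= data set.

```sas
/* Assumption: rowid and hmeq_keyed are hypothetical names used for illustration. */
/* Add a row key so that observations can be matched explicitly. */
data hmeq_keyed;
   set sampsio.hmeq;
   rowid = _n_;
run;

/* Assumption: the ID statement carries rowid into the OUT= data set. */
proc hpimpute data=hmeq_keyed out=imout;
   id rowid;
   input mortdue value yoj clage ninq clno debtinc derog delinq;
   impute mortdue value yoj clage ninq clno debtinc derog delinq/method=mean;
run;

data job_reason;
   set hmeq_keyed;
   if job='' then job="Other";
   if reason='' then reason="DebtCon";
run;

/* A match-merge requires both tables to be sorted by the key. */
proc sort data=imout;
   by rowid;
run;
proc sort data=job_reason;
   by rowid;
run;

/* The BY statement matches observations on rowid instead of by position. */
data imout;
   merge imout job_reason;
   by rowid;
run;
```

With a BY statement, the result no longer depends on the order in which PROC HPIMPUTE writes its output.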

The only way to guarantee data order is by using a PERFORMANCE statement with the NTHREADS=1 option in the PROC HPIMPUTE invocation.
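As a sketch of that correction, the PROC HPIMPUTE step from the example can be rerun with a PERFORMANCE statement added. Only the PERFORMANCE line differs from the original code.

```sas
proc hpimpute data=sampsio.hmeq out=imout;
   input mortdue value yoj clage ninq clno debtinc derog delinq;
   impute mortdue value yoj clage ninq clno debtinc derog delinq/method=mean;
   /* Run single-threaded so that OUT= keeps the observation order of DATA= */
   performance nthreads=1;
run;
```

The two DATA steps that follow in the example can then be left unchanged, because the positional merge lines up again.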

If the MERGE is done correctly, then the variable-importance ranking and misclassification rates are similar with and without imputation. In that case, the premise "imputing variables reduce the predictive power of the variables" is no longer valid.
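A quick way to check whether a merge lined up is to compare original and imputed values on rows where the original value is not missing, because imputation should leave those values unchanged. The sketch below assumes that PROC HPIMPUTE names the imputed variable im_mortdue (an IM_ prefix) and that the merged table imout still contains the original mortdue. Treat both as assumptions to confirm against your own output.

```sas
/* Assumption: im_mortdue is the imputed version of mortdue in imout. */
data _null_;
   set imout end=last;
   /* A nonmissing original value should equal its imputed counterpart. */
   if not missing(mortdue) and im_mortdue ne mortdue then mismatches + 1;
   if last then put "Rows where mortdue and im_mortdue disagree: " mismatches;
run;
```

A count of zero is consistent with a correctly matched merge. A large count suggests that observations were paired by position after the order changed.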

 

 



Operating System and Release Information

Product Family  Product                           System                       Product Release     SAS Release
                                                                               Reported  Fixed*    Reported   Fixed*
SAS System      SAS High-Performance Data Mining  Microsoft® Windows® for x64  12.2      15.1      9.3 TS1M2  9.4 TS1M6
                                                  64-bit Enabled AIX           12.2      15.1      9.3 TS1M2  9.4 TS1M6
                                                  64-bit Enabled Solaris       12.2      15.1      9.3 TS1M2  9.4 TS1M6
                                                  Linux for x64                12.2      15.1      9.3 TS1M2  9.4 TS1M6
                                                  Solaris for x64              12.2      15.1      9.3 TS1M2  9.4 TS1M6
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.